Update Video Encoder and tests for 6 container formats #913

Dan-Flores · 2025-09-30T14:39:09Z

This PR updates the VideoEncoder to support encoding for common video container formats, nearly identically to the FFmpeg CLI.

Changes:

Some changes are made to align with the design in #907:

- Rely on avcodec_find_best_pix_fmt_of_list to select the best pixel format for the default codec
- Add crf as an option on the C++ side to enable round trip tests

Testing

test_video_encoder_round_trip: Ensures that a video's decoded frames are the same after encoding then decoding.
- mov, mp4, mkv, webm
test_video_encoder_against_ffmpeg_cli: Ensures that the VideoEncoder frames are the same as the FFmpeg CLI.
- mov, mp4, avi, mkv, webm, flv, gif
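
For readers unfamiliar with how a comparison against the FFmpeg CLI is typically set up, here is a minimal sketch. It assumes the source frames were first dumped to disk as headerless RGB24 bytes, and that the output container is inferred by FFmpeg from the file extension. The helper name `build_ffmpeg_rawvideo_cmd` and the file names are hypothetical and are not taken from this PR's test code.

```python
def build_ffmpeg_rawvideo_cmd(raw_path, out_path, width, height, fps, crf=None):
    """Build an FFmpeg CLI command that encodes raw RGB24 frames from a file.

    Hypothetical helper for illustration; it only builds the argument list
    and does not run FFmpeg.
    """
    cmd = [
        "ffmpeg",
        "-y",                        # overwrite the output file if it exists
        "-f", "rawvideo",            # input is headerless raw frames
        "-pix_fmt", "rgb24",         # 3 bytes per pixel, packed RGB
        "-s", f"{width}x{height}",   # frame size must be given for raw input
        "-r", str(fps),              # input frame rate
        "-i", str(raw_path),
    ]
    if crf is not None:
        # Only meaningful for encoders that expose a crf option.
        cmd += ["-crf", str(crf)]
    cmd.append(str(out_path))        # container is chosen from the extension
    return cmd


if __name__ == "__main__":
    print(" ".join(build_ffmpeg_rawvideo_cmd("frames.raw", "ffmpeg_output.mp4", 1280, 720, 30, crf=0)))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)`; the frames decoded from that file would then be compared with the frames decoded from the VideoEncoder output.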

Testing caveats

The crf parameter is needed to test lossless encoding in the round trip test. For formats that do not support crf, the round trip test is not available.
When lossy encoding occurs due to codec + pixel format selection, assert_close is substituted by assert_tensor_close_on_at_least with a lower percentage match (96-99), and a higher atol (2-15).
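
To make the relaxed check concrete, here is a rough sketch of what "at least N% of values within atol" means. It works on flat Python sequences for illustration only; the real assert_tensor_close_on_at_least helper operates on tensors, and its exact signature is not shown in this thread. The function name `close_on_at_least` is hypothetical.

```python
def close_on_at_least(actual, expected, *, percentage, atol):
    """Return True if at least `percentage` percent of values differ by <= atol.

    Illustrative only: `actual` and `expected` are flat sequences of numbers.
    """
    if len(actual) != len(expected):
        raise ValueError("inputs must have the same number of values")
    if not actual:
        return True
    within = sum(1 for a, e in zip(actual, expected) if abs(a - e) <= atol)
    return 100.0 * within / len(actual) >= percentage


if __name__ == "__main__":
    source = [0, 10, 20, 30]
    encoded = [0, 11, 22, 40]  # one value is far off, three are within atol=2
    print(close_on_at_least(source, encoded, percentage=75, atol=2))
    print(close_on_at_least(source, encoded, percentage=96, atol=2))
```

Compared with a strict assert_close, this tolerates a small fraction of outlier pixels, which is the trade-off described above for lossy codec and pixel format combinations.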

Dan-Flores · 2025-10-07T15:58:11Z

src/torchcodec/_core/Encoder.cpp

+    avCodecContext_->global_quality = FF_QP2LAMBDA * qp;
  }
+  int status = avcodec_open2(avCodecContext_.get(), avCodec, &options);
+  av_dict_free(&options);


Above, the crf parameter is reused to set qscale to encode high quality videos in round-trip tests. But, the C++ function only allows crf to be set, not qscale. Since qscale is not needed anywhere else, I did not think it was worth including, but I am open to feedback here.

Thanks for the context.

My understanding is that for those codecs that do not support crf, we set instead the qscale (quantizer scale) parameter. They both control encoding quality, but in different ways.

I think... we should avoid doing that. I don't have a good enough understanding of how these 2 parameters (and their values!) relate to each other, and I think we can punt on that for a first release of the encoder. Especially since we only really need this workaround for our round-trip test to run. It means we won't be able to do the round-trip tests on those formats, but that's OK:

- those formats aren't that popular anyway
- we should still be able to do the test against the FFmpeg CLI.

NicolasHug · 2025-10-10T10:06:16Z

test/test_ops.py

+        assert (
+            source_frames.shape == round_trip_frames.shape
+        ), f"Shape mismatch: source {source_frames.shape} vs round_trip {round_trip_frames.shape}"
+        assert (
+            source_frames.dtype == round_trip_frames.dtype
+        ), f"Dtype mismatch: source {source_frames.dtype} vs round_trip {round_trip_frames.dtype}"


Just use plain assert like

assert source_frames.shape == round_trip_frames.shape

pytest will provide a proper error message

NicolasHug · 2025-10-10T10:09:34Z

test/test_ops.py

+            atol = 2
        # Check that PSNR for decode(encode(samples)) is above 30
        for s_frame, rt_frame in zip(source_frames, round_trip_frames):
            res = psnr(s_frame, rt_frame)
            assert res > 30


I realize it's not from this PR but let's clean that up a little:

- the comment isn't needed as the code is really self explanatory (and it doesn't just do psnr validation anymore!)
- no need to store res

Suggested change

-            atol = 2
-        # Check that PSNR for decode(encode(samples)) is above 30
-        for s_frame, rt_frame in zip(source_frames, round_trip_frames):
-            res = psnr(s_frame, rt_frame)
-            assert res > 30
+            atol = 2
+        for s_frame, rt_frame in zip(source_frames, round_trip_frames):
+            assert psnr(s_frame, rt_frame) > 30
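
For context on the `psnr(...) > 30` threshold, here is a small sketch of the standard PSNR formula for 8-bit data, 10 * log10(MAX^2 / MSE). It uses flat Python lists instead of tensors, and it is not the `psnr` helper from the test suite, whose implementation is not shown in this thread.

```python
import math


def psnr(frame_a, frame_b, max_val=255):
    """Peak signal-to-noise ratio in dB between two equally sized frames.

    Illustrative only: frames are flat sequences of pixel values.
    """
    if len(frame_a) != len(frame_b) or not frame_a:
        raise ValueError("frames must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(frame_a, frame_b)) / len(frame_a)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(max_val ** 2 / mse)


if __name__ == "__main__":
    # Every pixel off by one gives an MSE of 1, which is well above 30 dB.
    print(psnr([100, 100, 100, 100], [101, 101, 101, 101]))
```

Higher values mean the frames are closer, so a lower bound of 30 dB rejects round trips that degrade the frames noticeably.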

NicolasHug · 2025-10-10T10:09:47Z

test/test_ops.py

+            assert_close(s_frame, rt_frame, atol=atol, rtol=0)
+
+    @pytest.mark.skipif(in_fbcode(), reason="ffmpeg CLI not available")
+    @pytest.mark.skipif(in_fbcode(), reason="ffmpeg CLI not available")


It's duplicated

Suggested change

@pytest.mark.skipif(in_fbcode(), reason="ffmpeg CLI not available")

NicolasHug · 2025-10-10T10:14:31Z

test/test_ops.py

+                    "Codec for webm is not available in the FFmpeg6/7 installation on Windows."
+                )
+        asset = TEST_SRC_2_720P
        # Test that decode(encode(decode(asset))) == decode(asset)


This comment should be at the top

NicolasHug · 2025-10-10T10:14:48Z

test/test_ops.py

+            f.write(source_frames.permute(0, 2, 3, 1).cpu().numpy().tobytes())
+
+        ffmpeg_encoded_path = str(tmp_path / f"ffmpeg_output.{format}")
+        # Test that lossless encoding is identical


Not sure if this comment is needed here, it looks out of place

NicolasHug · 2025-10-10T10:17:07Z

src/torchcodec/_core/FFMPEGCommon.cpp


+const AVPixelFormat* getSupportedPixelFormats(const AVCodec& avCodec) {
+  const AVPixelFormat* supportedPixelFormats = nullptr;
+#if LIBAVCODEC_VERSION_INT >= AV_VERSION_INT(61, 13, 100)


Please add a comment to specify which major FFmpeg version this corresponds to
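
As background for that comment: AV_VERSION_INT packs major, minor and micro into one integer as `(major << 16) | (minor << 8) | micro`, so the `#if` is a plain integer comparison. The sketch below mirrors the macro in Python for illustration. To my knowledge the libavcodec 61.x series ships with FFmpeg 7.x, which is consistent with the "FFmpeg7+" wording in the commit list, but that mapping is an assumption worth checking against the FFmpeg release notes.

```python
def av_version_int(major, minor, micro):
    """Python mirror of FFmpeg's AV_VERSION_INT(a, b, c) macro."""
    return (major << 16) | (minor << 8) | micro


def av_version_tuple(version_int):
    """Unpack a packed version back into (major, minor, micro)."""
    return (version_int >> 16, (version_int >> 8) & 0xFF, version_int & 0xFF)


if __name__ == "__main__":
    threshold = av_version_int(61, 13, 100)
    print(threshold, av_version_tuple(threshold))
    # A libavcodec 60.x build compares below the threshold.
    print(av_version_int(60, 31, 102) >= threshold)
```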

NicolasHug · 2025-10-10T10:22:54Z

src/torchcodec/_core/Encoder.cpp

+    // av_packet_rescale_ts ensures encoded frames have correct timestamps.
+    // This prevents "no more frames" errors when decoding encoded frames,
+    // https://github.com/pytorch/audio/blob/b6a3368a45aaafe05f1a6a9f10c68adc5e944d9e/src/libtorio/ffmpeg/stream_writer/encoder.cpp#L46


https://github.com/pytorch/audio/blob/b6a3368a45aaafe05f1a6a9f10c68adc5e944d9e/src/libtorio/ffmpeg/stream_writer/encoder.cpp#L46 links to

if (packet->duration == 0 && codec_ctx->codec_type == AVMEDIA_TYPE_VIDEO) {
  // 1 means that 1 frame (in codec time base, which is the frame rate)
  // This has to be set before av_packet_rescale_ts bellow.
  packet->duration = 1;
}

which seems to be about the lines just above.

Is this comment at the right place? Maybe it should be a few lines above - and it should also explain why we need to set duration to 1 ?

I moved the comment to be above the packet->duration and av_packet_rescale_ts lines and updated the text to reference it as "the code below". Let me know if I misunderstood your suggestion
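
To illustrate why the duration has to be set before rescaling: av_packet_rescale_ts converts a packet's pts, dts and duration from the codec time base to the stream time base, so a duration of 0 stays 0 while a duration of 1 (one frame in the codec time base) becomes the frame length in stream ticks. The sketch below is a simplified model of that conversion using exact fractions; the rounding rule (nearest, halves away from zero) and the 1/15360 stream time base are assumptions chosen for the example, not values taken from this PR.

```python
import math
from fractions import Fraction


def rescale_ts(value, src_time_base, dst_time_base):
    """Convert a timestamp or duration between two time bases.

    Simplified model: rounds to nearest, with halves away from zero.
    """
    exact = value * src_time_base / dst_time_base
    half = Fraction(1, 2)
    if exact >= 0:
        return math.floor(exact + half)
    return -math.floor(-exact + half)


if __name__ == "__main__":
    codec_tb = Fraction(1, 30)       # one tick per frame at 30 fps
    stream_tb = Fraction(1, 15360)   # hypothetical container time base
    for name, value in [("unset duration", 0), ("one frame", 1), ("pts of frame 3", 3)]:
        print(name, rescale_ts(value, codec_tb, stream_tb))
```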

NicolasHug · 2025-10-10T10:24:21Z

src/torchcodec/_core/Encoder.h

  UniqueEncodingAVFormatContext avFormatContext_;
  UniqueAVCodecContext avCodecContext_;
  int streamIndex_ = -1;
+  AVStream* avStream_;


IIUC we now store avStream_ mostly because we need to access time_base? If that's the case, then let's get rid of the streamIndex_ field because it can now be accessed through avStream_

meta-codesync · 2025-10-10T23:01:34Z

@Dan-Flores has imported this pull request. If you are a Meta employee, you can view this in D84393092.

meta-cla bot added the CLA Signed label Sep 30, 2025

Dan-Flores force-pushed the encoder_accuracy branch 3 times, most recently from f414d0b to c29dee3 Compare October 6, 2025 15:39

Dan-Flores marked this pull request as ready for review October 7, 2025 14:51

Daniel Flores added 16 commits October 10, 2025 13:55

check in testsrc mp4 and all_frames info

56a96fb

Use testsrc in round trip test

f09fe35

add func to access pix_fmts in FFmpeg7+

5c94fda

fix 6 container fmts

f438729

pass crf variable in custom ops

1162beb

Add tolerances for various cases

ea85cfe

clean up logging, comments

f0fffca

delete unused resource

444254e

remove whitespace

4bac987

adjust test order

75c5b36

remove positional arg

796499e

Make crf optional

2cafa10

crf default

1aebfec

name args?

49d85d6

windows + webm test skips

4ab1b63

compare against cli, high % match

5f2928f

Daniel Flores added 6 commits October 10, 2025 13:55

skip webm/windows/ffmpeg6,7

2055291

test gif against ffmpeg cli

e0e456c

more sensible qscale lower bound

d2b2f14

incorporate comment suggestions in test_ops

d7bb786

move torchaudio comment, remove streamIndex_

266f9f5

remove qscale, adjust tests

4516b35

Dan-Flores force-pushed the encoder_accuracy branch from fe8fb87 to 4516b35 Compare October 10, 2025 19:34
